Chapter 1 continued: Association and Experiments

Overview

  • 1.1 Case study: Using stents to prevent strokes
  • 1.2 Data basics
    • 1.2.1 Observations, variables, and data matrices
    • 1.2.2 Types of variables
    • 1.2.3 Relationships between variables
    • 1.2.4 Explanatory and response variables
    • 1.2.5 Observational studies and experiments

1.2.3 Relationships between variables

Homeownership rate and multi-unit structures

  • homeownership: the percentage of homes that are owned by residents
  • multi_unit: the percentage of housing units that are in multi-unit structures (e.g., apartments, condos)

Are these two variables related?

name state homeownership multi_unit
Autauga County Alabama 77.5 7.2
Baldwin County Alabama 76.7 22.6
Barbour County Alabama 68.0 11.1
Bibb County Alabama 82.9 6.6
Blount County Alabama 82.0 3.7
Bullock County Alabama 76.9 9.9
Butler County Alabama 69.0 13.7
Calhoun County Alabama 70.7 14.3
Chambers County Alabama 71.4 8.7
Cherokee County Alabama 77.5 4.3
Chilton County Alabama 75.1 4.4
Choctaw County Alabama 85.6 3.9
Clarke County Alabama 80.0 6.3
Clay County Alabama 72.8 11.2
Cleburne County Alabama 74.9 5.3
Coffee County Alabama 69.7 13.6
Colbert County Alabama 73.5 12.3
Conecuh County Alabama 81.6 6.0
Coosa County Alabama 83.7 1.9
Covington County Alabama 74.0 6.1
Crenshaw County Alabama 67.8 9.2
Cullman County Alabama 74.7 8.5
Dale County Alabama 61.2 13.2
Dallas County Alabama 62.6 16.0
DeKalb County Alabama 77.5 6.4
Elmore County Alabama 77.6 7.0
Escambia County Alabama 73.5 7.8
Etowah County Alabama 73.0 11.9
Fayette County Alabama 76.0 7.9
Franklin County Alabama 69.2 10.4
Geneva County Alabama 71.6 6.6
Greene County Alabama 71.1 11.1
Hale County Alabama 74.3 6.1
Henry County Alabama 81.9 3.2
Houston County Alabama 67.4 15.2
Jackson County Alabama 76.6 5.8
Jefferson County Alabama 66.8 24.0
Lamar County Alabama 75.1 9.0
Lauderdale County Alabama 73.0 14.7
Lawrence County Alabama 78.7 5.1
Lee County Alabama 64.2 23.3
Limestone County Alabama 77.1 9.4
Lowndes County Alabama 75.4 7.0
Macon County Alabama 68.0 15.5
Madison County Alabama 70.4 21.4
Marengo County Alabama 73.5 8.3
Marion County Alabama 75.8 10.9
Marshall County Alabama 72.5 9.3
Mobile County Alabama 68.4 17.7
Monroe County Alabama 73.8 6.0
Montgomery County Alabama 63.2 22.4
Morgan County Alabama 73.1 13.7
Perry County Alabama 67.8 11.4
Pickens County Alabama 74.1 10.1
Pike County Alabama 56.3 18.7
Randolph County Alabama 75.9 4.8
Russell County Alabama 62.3 17.8
St. Clair County Alabama 82.2 5.5
Shelby County Alabama 80.6 11.4
Sumter County Alabama 68.3 14.5
Talladega County Alabama 73.0 9.6
Tallapoosa County Alabama 73.3 8.9
Tuscaloosa County Alabama 63.3 25.4
Walker County Alabama 77.7 6.6
Washington County Alabama 83.0 2.6
Wilcox County Alabama 76.8 6.0
Winston County Alabama 73.8 6.1
Aleutians East Borough Alaska 59.2 11.8
Aleutians West Census Area Alaska 36.3 30.9
Anchorage Municipality Alaska 61.7 35.3
Bethel Census Area Alaska 61.3 13.9
Bristol Bay Borough Alaska 56.6 13.4
Denali Borough Alaska 60.7 14.1
Dillingham Census Area Alaska 60.7 14.1
Fairbanks North Star Borough Alaska 59.8 26.2
Haines Borough Alaska 74.5 13.2
Hoonah Angoon Census Area Alaska 64.0 8.6
Juneau City and Borough Alaska 64.0 32.2
Kenai Peninsula Borough Alaska 72.7 12.0
Ketchikan Gateway Borough Alaska 59.1 36.7
Kodiak Island Borough Alaska 59.2 25.9
Lake and Peninsula Borough Alaska 75.0 2.5
Matanuska-Susitna Borough Alaska 79.2 10.1
Nome Census Area Alaska 56.2 17.4
North Slope Borough Alaska 48.3 24.9
Northwest Arctic Borough Alaska 53.7 19.4
Petersburg Borough Alaska 76.7 9.5
Prince of Wales-Hyder Census Area Alaska 69.0 9.7
Sitka City and Borough Alaska 55.9 24.4
Skagway Alaska 59.1 27.2
Southeast Fairbanks Census Area Alaska 65.2 17.3
Valdez-Cordova Census Area Alaska 71.8 17.6
Kusilvak Census Area Alaska 64.8 4.1
Wrangell Alaska 78.7 11.9
Yakutat City and Borough Alaska 61.1 12.4
Yukon-Koyukuk Census Area Alaska 69.1 2.9
Apache County Arizona 76.3 5.2
Cochise County Arizona 69.0 12.2
Coconino County Arizona 61.2 18.9
Gila County Arizona 78.3 4.8
Graham County Arizona 72.0 7.7
Greenlee County Arizona 46.9 6.1
La Paz County Arizona 75.4 3.6
Maricopa County Arizona 66.3 25.1
Mohave County Arizona 71.5 9.8
Navajo County Arizona 72.5 7.0
Pima County Arizona 64.6 22.9
Pinal County Arizona 77.7 6.3
Santa Cruz County Arizona 71.0 17.6
Yavapai County Arizona 72.5 11.0
Yuma County Arizona 69.6 12.5
Arkansas County Arkansas 64.4 10.9
Ashley County Arkansas 72.4 8.5
Baxter County Arkansas 76.6 10.1
Benton County Arkansas 70.1 15.6
Boone County Arkansas 72.7 12.1
Bradley County Arkansas 70.2 8.0
Calhoun County Arkansas 82.1 2.4
Carroll County Arkansas 69.5 11.6
Chicot County Arkansas 69.6 10.0
Clark County Arkansas 67.6 15.9
Clay County Arkansas 74.0 8.1
Cleburne County Arkansas 77.9 5.7
Cleveland County Arkansas 78.0 3.1
Columbia County Arkansas 69.8 11.4
Conway County Arkansas 75.8 6.3
Craighead County Arkansas 61.2 20.0
Crawford County Arkansas 73.0 10.8
Crittenden County Arkansas 58.2 23.1
Cross County Arkansas 70.7 11.3
Dallas County Arkansas 70.5 7.9
Desha County Arkansas 59.1 17.2
Drew County Arkansas 67.5 7.7
Faulkner County Arkansas 66.4 17.3
Franklin County Arkansas 78.8 4.7
Fulton County Arkansas 79.6 3.4
Garland County Arkansas 70.1 16.7
Grant County Arkansas 80.3 2.4
Greene County Arkansas 65.7 13.5
Hempstead County Arkansas 68.4 10.2
Hot Spring County Arkansas 76.0 4.4
Howard County Arkansas 69.6 8.2
Independence County Arkansas 72.8 7.3
Izard County Arkansas 79.7 4.5
Jackson County Arkansas 69.8 15.0
Jefferson County Arkansas 64.4 14.8
Johnson County Arkansas 68.8 11.3
Lafayette County Arkansas 79.2 5.4
Lawrence County Arkansas 67.2 8.1
Lee County Arkansas 66.3 13.9
Lincoln County Arkansas 69.3 8.4
Little River County Arkansas 71.2 9.4
Logan County Arkansas 79.1 4.0
Lonoke County Arkansas 74.4 8.6
Madison County Arkansas 75.3 3.6
Marion County Arkansas 81.6 4.7
Miller County Arkansas 66.3 18.7
Mississippi County Arkansas 59.9 16.8
Monroe County Arkansas 61.4 16.1
Montgomery County Arkansas 82.8 2.5
Nevada County Arkansas 71.3 5.9
Newton County Arkansas 79.7 4.0
Ouachita County Arkansas 69.8 10.5
Perry County Arkansas 81.7 2.3
Phillips County Arkansas 54.7 15.7
Pike County Arkansas 74.8 6.1
Poinsett County Arkansas 66.4 11.5
Polk County Arkansas 77.4 3.7
Pope County Arkansas 69.7 14.1
Prairie County Arkansas 72.6 5.8
Pulaski County Arkansas 60.4 24.9
Randolph County Arkansas 76.6 6.8
St. Francis County Arkansas 58.7 15.2
Saline County Arkansas 77.7 7.8
Scott County Arkansas 75.1 6.4
Searcy County Arkansas 75.0 4.7
Sebastian County Arkansas 63.4 22.3
Sevier County Arkansas 74.2 6.1
Sharp County Arkansas 80.8 3.6
Stone County Arkansas 80.4 1.4
Union County Arkansas 71.2 8.8
Van Buren County Arkansas 78.5 9.9
Washington County Arkansas 56.3 30.1
White County Arkansas 69.0 13.8
Woodruff County Arkansas 61.5 12.9
Yell County Arkansas 70.1 7.3
Alameda County California 55.1 38.0
Alpine County California 73.4 38.4
Amador County California 77.3 7.5
Butte County California 61.2 19.7
Calaveras County California 78.8 3.7
Colusa County California 64.4 14.3
Contra Costa County California 69.5 23.8
Del Norte County California 60.9 14.6
El Dorado County California 76.5 12.1
Fresno County California 55.0 25.8
Glenn County California 67.5 11.4
Humboldt County California 57.6 18.9
Imperial County California 56.6 21.4
Inyo County California 64.0 13.1
Kern County California 61.4 18.1
Kings County California 56.0 18.5
Lake County California 67.1 7.8
Lassen County California 63.7 9.7
Los Angeles County California 48.2 41.8
Madera County California 63.0 12.0
Marin County California 64.0 27.0
Mariposa County California 70.0 8.0
Mendocino County California 62.8 13.1
Merced County California 55.9 17.3
Modoc County California 70.2 5.3
Mono County California 56.4 51.1
Monterey County California 53.4 26.6
Napa County California 65.1 19.7
Nevada County California 74.0 9.5
Orange County California 60.8 33.7
Placer County California 72.9 17.1
Plumas County California 65.6 6.6
Riverside County California 70.0 16.1
Sacramento County California 59.5 26.9
San Benito County California 64.3 14.3
San Bernardino County California 65.1 18.8
San Diego County California 55.9 35.5
San Francisco County California 37.5 66.6
San Joaquin County California 61.7 18.7
San Luis Obispo County California 61.4 17.7
San Mateo County California 61.1 32.4
Santa Barbara County California 54.1 29.5
Santa Clara County California 59.2 32.8
Santa Cruz County California 59.6 21.4
Shasta County California 66.0 16.0
Sierra County California 80.1 4.5
Siskiyou County California 65.2 14.4
Solano County California 65.8 21.4
Sonoma County California 62.4 18.8
Stanislaus County California 62.1 16.5
Sutter County California 61.8 20.0
Tehama County California 65.1 11.6
Trinity County California 73.5 7.1
Tulare County California 59.3 14.4
Tuolumne County California 70.2 8.6
Ventura County California 66.4 20.2
Yolo County California 54.1 30.6
Yuba County California 59.8 17.8
Adams County Colorado 68.4 23.9
Alamosa County Colorado 63.2 20.3
Arapahoe County Colorado 65.9 33.7
Archuleta County Colorado 82.9 17.0
Baca County Colorado 74.9 4.4
Bent County Colorado 67.4 8.9
Boulder County Colorado 63.9 28.2
Broomfield County Colorado 74.4 21.7
Chaffee County Colorado 76.9 7.4
Cheyenne County Colorado 80.5 5.6
Clear Creek County Colorado 81.3 11.8
Conejos County Colorado 75.7 4.1
Costilla County Colorado 74.6 7.3
Crowley County Colorado 74.0 6.5
Custer County Colorado 80.5 4.7
Delta County Colorado 74.3 5.3
Denver County Colorado 52.5 44.9
Dolores County Colorado 78.3 1.7
Douglas County Colorado 82.5 14.9
Eagle County Colorado 65.3 36.3
Elbert County Colorado 91.3 1.4
El Paso County Colorado 66.6 22.4
Fremont County Colorado 76.6 10.0
Garfield County Colorado 67.5 20.5
Gilpin County Colorado 71.8 11.3
Grand County Colorado 76.9 30.3
Gunnison County Colorado 59.1 28.8
Hinsdale County Colorado 83.5 3.6
Huerfano County Colorado 72.1 8.6
Jackson County Colorado 72.3 5.6
Jefferson County Colorado 71.9 24.7
Kiowa County Colorado 66.6 3.4
Kit Carson County Colorado 69.3 10.7
Lake County Colorado 66.9 17.7
La Plata County Colorado 69.1 18.7
Larimer County Colorado 67.5 21.4
Las Animas County Colorado 69.5 12.0
Lincoln County Colorado 70.8 11.8
Logan County Colorado 68.3 15.0
Mesa County Colorado 72.3 14.8
Mineral County Colorado 86.4 0.2
Moffat County Colorado 75.1 14.0
Montezuma County Colorado 72.8 9.5
Montrose County Colorado 74.6 8.9
Morgan County Colorado 67.3 12.1
Otero County Colorado 66.1 13.3
Ouray County Colorado 74.3 8.1
Park County Colorado 87.9 1.3
Phillips County Colorado 73.3 7.6
Pitkin County Colorado 62.5 40.0
Prowers County Colorado 66.7 14.9
Pueblo County Colorado 69.9 16.2
Rio Blanco County Colorado 74.1 13.8
Rio Grande County Colorado 78.9 8.9
Routt County Colorado 74.1 28.8
Saguache County Colorado 68.3 8.1

Associated Variables

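  • The multi-unit and homeownership rates are said to be associated because a scatterplot of homeownership against multi_unit shows a discernible pattern.
    • The downward trend means the variables are negatively associated: counties with a higher multi-unit rate tend to have a lower homeownership rate.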
  • When two variables show some connection with one another, they are called associated variables.
  • If two variables are not associated, then they are said to be independent. That is, two variables are independent if there is no evident relationship between the two.
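
As a rough numerical check on the downward trend, the sketch below computes a sample correlation coefficient for ten counties copied from the table above. The correlation coefficient is not introduced in this section; it is used here only as an assumed summary of the direction of the trend. The helper name `pearson_r` is hypothetical, and the ten rows were hand-picked rather than randomly sampled, so the result illustrates the idea and does not describe the full data set.

```python
import math

# (county, homeownership %, multi_unit %): ten rows copied from the table above.
# Hand-picked for illustration, not a random sample.
counties = [
    ("Autauga County, Alabama", 77.5, 7.2),
    ("Baldwin County, Alabama", 76.7, 22.6),
    ("Jefferson County, Alabama", 66.8, 24.0),
    ("Tuscaloosa County, Alabama", 63.3, 25.4),
    ("Anchorage Municipality, Alaska", 61.7, 35.3),
    ("Los Angeles County, California", 48.2, 41.8),
    ("San Francisco County, California", 37.5, 66.6),
    ("Denver County, Colorado", 52.5, 44.9),
    ("Elbert County, Colorado", 91.3, 1.4),
    ("Mineral County, Colorado", 86.4, 0.2),
]


def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


homeownership = [row[1] for row in counties]
multi_unit = [row[2] for row in counties]

r = pearson_r(multi_unit, homeownership)
# A value below zero is consistent with a negative association.
print(f"correlation = {r:.2f}")
```

A negative value agrees with the downward trend described above; a value near zero would suggest no evident linear relationship in these rows.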

1.2.4 Explanatory and response variables

Suppose that \(X\) and \(Y\) are associated variables.

  • If variable \(X\) helps us explain or predict the value of variable \(Y\), we say that \(X\) is the explanatory variable and \(Y\) is the response variable.
  • Sometimes (not always) the explanatory variable affects the response variable, i.e., the change in one variable causes a change in the other.
    • Example: Does the median household income in a county cause its population size to change?

1.2.5 Observational studies and experiments

There are two primary types of data collection: experiments and observational studies.

  • In an experiment, researchers assign subjects to two or more groups, apply a different treatment to each group, and compare the outcomes.
    • In a randomized experiment, researchers randomly assign subjects to the groups (e.g., stent study).
  • In an observational study, researchers collect data in a way that does not directly interfere with how the data arise (e.g., county data set).
  • Beware: Association \(\neq\) Causation.
    • Observational studies on their own generally cannot establish causation (e.g., TVs predict life expectancy).
    • Well-designed randomized experiments can provide strong evidence of causation.
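
Random assignment can be sketched in a few lines. The example below is a minimal illustration, not the procedure used in the stent study: the function name `randomly_assign`, the 20 subject IDs, and the seed are all hypothetical. It shuffles the subjects and splits the shuffled list, so chance alone decides who lands in each group.

```python
import random


def randomly_assign(subjects, n_treatment, seed=None):
    """Shuffle the subjects and split them into (treatment, control) groups."""
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(subjects)      # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    return shuffled[:n_treatment], shuffled[n_treatment:]


subject_ids = list(range(1, 21))   # 20 hypothetical subject IDs
treatment, control = randomly_assign(subject_ids, n_treatment=10, seed=1)

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because the researchers do not choose who receives the treatment, other characteristics of the subjects tend to balance out across the groups, which is what lets a randomized experiment support causal conclusions.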

Group Discussion

Air pollution

Researchers collected data to examine the relationship between air pollutants and preterm births in Southern California. During the study, air pollution levels were measured by air quality monitoring stations. Specifically, levels of carbon monoxide were recorded in parts per million, nitrogen dioxide and ozone in parts per hundred million, and coarse particulate matter (PM\(_{10}\)) in \(\mu g/m^3\). Length of gestation data were collected on 143,196 births between the years 1989 and 1993, and air pollution exposure during gestation was calculated for each birth. The analysis suggested that increased ambient PM\(_{10}\) and, to a lesser degree, CO concentrations may be associated with the occurrence of preterm births.

  1. Identify the main research question of the study.
  2. Who are the subjects (observational units) in this study, and how many are included?
  3. What are the variables in the study? Identify each variable as numerical or categorical.
  4. Which is the explanatory variable, and which is the response?
  5. Was this study an experiment or an observational study?

Migraines and acupuncture

A migraine is a particularly painful type of headache, which patients sometimes wish to treat with acupuncture. To determine whether acupuncture relieves migraine pain, researchers conducted a randomized controlled study in which 89 individuals who identified as female and had been diagnosed with migraine headaches were randomly assigned to one of two groups: treatment or control. Forty-three (43) patients in the treatment group received acupuncture specifically designed to treat migraines. Forty-six (46) patients in the control group received placebo acupuncture (needle insertion at non-acupoint locations). Twenty-four (24) hours after patients received acupuncture, they were asked if they were pain free. Results are summarized in the contingency table below. Also provided is a figure from the original paper displaying the appropriate area (M) versus the inappropriate area (S) used in the treatment of migraine attacks.

               Pain free?
Group          No    Yes
Control        44      2
Treatment      33     10

  1. What percent of patients in the treatment group were pain free 24 hours after receiving acupuncture?

  2. What percent were pain free in the control group?

  3. In which group did a higher percent of patients become pain free 24 hours after receiving acupuncture?

  4. What are the explanatory and response variables in this study? Classify each as numerical or categorical.

  5. Your findings so far might suggest that acupuncture is an effective treatment for migraines for all people who suffer from migraines. However, this is not the only possible conclusion. What is one other possible explanation for the observed difference between the percentages of patients who are pain free 24 hours after receiving acupuncture in the two groups?
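
To check your answers to the first three questions, the percentages can be computed directly from the contingency table. The sketch below is one possible way to do it; the dictionary layout and the helper name `percent_pain_free` are assumptions made for illustration.

```python
# Counts from the contingency table above: group -> (not pain free, pain free)
table = {
    "Control": (44, 2),
    "Treatment": (33, 10),
}


def percent_pain_free(no, yes):
    """Percent of a group that was pain free: yes count over the group total."""
    return 100 * yes / (no + yes)


percents = {group: percent_pain_free(no, yes) for group, (no, yes) in table.items()}

for group, pct in percents.items():
    print(f"{group}: {pct:.1f}% pain free")
```

Each percentage divides by that group's own total (46 for control, 43 for treatment), not by all 89 patients, because the comparison is between groups of different sizes.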